Abstract - This paper discusses the motivation, opportunities, and problems
associated with implementing digital logic at very low voltages, including
the challenge of making use of the available real estate in 3D multichip
modules, energy requirements of very large neural networks, energy optimization
metrics and their impact on system design, modeling problems, circuit design
constraints, possible fabrication process modifications to improve performance,
and barriers to practical implementation.

1 Introduction 

As technology continues to scale into the submicron regime, massively parallel
architectures are increasingly being constrained by power considerations. Minimizing the
energy per operation throughout the system is assuming increasing importance. We are
investigating “Ultra Low Power CMOS” to reduce the energy per operation in massively
parallel signal processors, microsatellites, and large scale neural networks. In particular,
we are investigating operation with supply and threshold voltages of a few hundred
millivolts to reduce energy per operation by more than 100 times.

In this paper, we show that minimum energy per operation is achieved in the
subthreshold regime, and that the optimum performance is obtained when Vdd = Vt and
Gnd = Vt - Vdd. We also show that minimum energy x time occurs when Vdd = 3Vt. We
show that Vt should be chosen such that Ion/Ioff = ld/a, where ld is the logic depth and a
is the activity ratio, the fraction of gates which are switching at any given time. We also
show that ld = 11 minimizes energy in a 32x32 bit parallel multiplier.

2 Motivation 

The application domains we are targeting include wideband spectrometers requiring 10^12
operations per second, microsatellites with 100mW power budgets, large scale neural
networks requiring 10^15 connections per second and 1fJ per connection, and small, massively
parallel digital signal coprocessors.

Figure 1: 3D MCM in an SBus slot: 2000 cm^2, 10W max. Vdd = 0.7V permits 10 GIPS.

As an example, a single SBus slot in a Sun SPARCstation occupies about 200 cm^3, can
accommodate over 2000 cm^2 of active silicon using 3D stacked multichip module technology,
and has a power budget of 10W (see Fig 1). An architecture with a power density of
2W/cm^2 and 40 MIPS per chip, typical of modern microprocessors, would dissipate 4kW
if tiled over the available area and achieve 80 billion operations per second. Only 5 cm^2 of
silicon can be used at 10W, yielding 200 MIPS. If the supply voltage is lowered to 700mV,
each chip would dissipate 5mW, and the entire 2000 cm^2 could be used to achieve 10 billion
operations per second at 10W.
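
The SBus arithmetic above can be checked with a short Python sketch. The chip area of 1 cm^2 and the 5 MIPS per chip at 700mV are not stated in the text; they are assumptions inferred from the quoted totals, and the function name tile is purely illustrative.

```python
import math

AREA_CM2 = 2000.0    # active silicon available in the slot (from the text)
BUDGET_W = 10.0      # SBus slot power budget (from the text)
CHIP_AREA_CM2 = 1.0  # ASSUMPTION: implied by 40 MIPS/chip and 80 GIPS over 2000 cm^2


def tile(power_per_chip_w, mips_per_chip):
    """Return (chips, total_power_w, total_mips), limited by area and by power."""
    by_area = math.floor(AREA_CM2 / CHIP_AREA_CM2)
    # Small epsilon guards against floating point rounding just below an integer.
    by_power = math.floor(BUDGET_W / power_per_chip_w + 1e-9)
    chips = min(by_area, by_power)
    return chips, chips * power_per_chip_w, chips * mips_per_chip


# Conventional supply: 2 W and 40 MIPS per chip (from the text).
conventional = tile(2.0, 40.0)

# 700 mV supply: 5 mW per chip (from the text); 5 MIPS per chip is an
# ASSUMPTION derived from 10 billion operations per second over 2000 chips.
low_voltage = tile(0.005, 5.0)

print("conventional:", conventional)
print("low voltage: ", low_voltage)
```

Under these assumptions the conventional design is power limited to a handful of chips, while the low voltage design is limited by area and fills the whole slot.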


3 Background 

Low voltage digital logic is not new. Richard Swanson described a 100mV CMOS ring
oscillator in [6]. Eric Vittoz discussed subthreshold design techniques used in the digital
watch industry in [4]. Carver Mead described a variety of subthreshold analog circuits
for neural networks in [1]. We believe that low voltage circuits can be used effectively for 
massively parallel computation in power constrained environments, and that lowering the 
voltage in submicron technologies has the added benefit of maintaining manageable signal 
frequencies at the system level. 


4 Transistor Current 

The following equations [6,7] describe drain current as a function of gate voltage, as shown
in Fig 2.

Figure 2: Transistor current vs voltage. Current is exponential with voltage below Vt and
quadratic above Vt.

[Plot: threshold mismatch]

Figure 3: Model discontinuity at Vgs = Vt. The subthreshold model says Ids = k n V_T^2. The
saturation model says Ids = (k/2)(Vgs - Vt)^2 = 0. In the figure Vt = 200mV.

subthreshold: Vgs < Vt; I0 = k n V_T^2

$$I_{ds} = I_0 \, e^{(V_{gs} - V_t)/(n V_T)} \left(1 - e^{-V_{ds}/V_T}\right)$$

saturation: Vt < Vgs < Vds + Vt

$$I_{ds} = \frac{k}{2} (V_{gs} - V_t)^2$$

linear: Vds + Vt < Vgs

$$I_{ds} = \frac{k}{2} \left(2 (V_{gs} - V_t) V_{ds} - V_{ds}^2\right)$$

where Vgs is the gate-source voltage, Vds is the drain-source voltage, Vt is the threshold
voltage, Ids is the drain current, k is the transconductance in A/V^2, n is the gate coupling
coefficient, usually around 0.7, V_T is the thermal voltage, 0.026V, and I0 is the current at
Vgs = Vt.

Note the exponential dependence of current on voltage below Vt, and the quadratic
dependence above Vt. These equations do a poor job of modeling behavior in the
neighborhood of Vt (see Fig 3).
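
The discontinuity at threshold can be made concrete with a minimal Python sketch of the three-region model. It takes I0 = k n V_T^2 and the exponents (Vgs - Vt)/(n V_T) and -Vds/V_T as read from the equations above. The transconductance value K is hypothetical, n = 0.7 is the typical value quoted above, and Vt = 200mV follows Fig 3.

```python
import math

V_THERMAL = 0.026  # thermal voltage V_T in volts (from the text)


def drain_current(vgs, vds, vt, k, n):
    """Piecewise drain current: subthreshold, saturation, then linear region."""
    i0 = k * n * V_THERMAL ** 2  # current at vgs == vt in the subthreshold model
    if vgs < vt:
        return (i0 * math.exp((vgs - vt) / (n * V_THERMAL))
                * (1.0 - math.exp(-vds / V_THERMAL)))
    if vgs < vds + vt:
        return 0.5 * k * (vgs - vt) ** 2
    return 0.5 * k * (2.0 * (vgs - vt) * vds - vds ** 2)


K = 1e-4   # HYPOTHETICAL transconductance in A/V^2
N = 0.7    # gate coupling coefficient, typical value quoted in the text
VT = 0.2   # threshold voltage used in Fig 3
VDS = 0.5  # HYPOTHETICAL drain-source voltage

just_below = drain_current(VT - 1e-9, VDS, VT, K, N)  # subthreshold branch
at_threshold = drain_current(VT, VDS, VT, K, N)       # saturation branch

print("just below Vt:", just_below)
print("at Vt:        ", at_threshold)
```

Approaching Vt from below, the current tends to I0, while the saturation expression evaluated at Vt gives exactly zero, which is the jump shown in Fig 3. The saturation and linear expressions, by contrast, agree at Vgs = Vds + Vt.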

[Plot: relative performance vs supply and threshold voltage]

Performance can be approximated when the supply voltage is over threshold by

$$f = \frac{I}{Q} = \frac{k}{2} \frac{(V - V_t)^2}{C V},$$

where f is the clock frequency, k is transconductance, and C is the capacitance being
switched.
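
As a rough illustration of this approximation, the following Python sketch evaluates f = (k/2)(V - Vt)^2/(C V) at two supply voltages. The values of K and C are hypothetical, so only the ratio of the two frequencies is meaningful.

```python
def clock_frequency(v, vt, k, c):
    """f = I/Q with I = (k/2)(V - Vt)^2 and Q = C*V; only valid for V > Vt."""
    if v <= vt:
        raise ValueError("approximation only holds above threshold")
    return 0.5 * k * (v - vt) ** 2 / (c * v)


K = 1e-4   # HYPOTHETICAL transconductance in A/V^2
C = 1e-13  # HYPOTHETICAL switched capacitance in farads
VT = 0.2   # threshold voltage, as in Fig 3

f_700 = clock_frequency(0.7, VT, K, C)
f_400 = clock_frequency(0.4, VT, K, C)
speed_ratio = f_700 / f_400  # independent of K and C

print("f(0.7V) / f(0.4V) =", speed_ratio)
```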


5 Optimum Logic Depth

We found the optimum logic depth in a 32 x 32 bit tree multiplier by reducing the supply
voltage to keep the throughput constant (see Fig 5). We also found the area penalty using
this approach (see Fig 6). ld = 11 is close to the propagation delay through a 4:2 adder
[2].


6 Minimum Energy 

The current available to switch a node is the difference between the current of the ON
device and the leakage current of the OFF device. In standard CMOS, Vt is so high that
Ioff can be ignored, but in low voltage applications it can be an appreciable fraction of
Ion.

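$$t_{pd} = \frac{Q}{I} = \frac{C_g V}{I} = \frac{C_g V}{I_{on} - I_{off}}$$

$$E_{dc} = I_{off} V l_d t_{pd} = C_g V^2 \frac{l_d}{I_{on}/I_{off} - 1}$$

$$E_{ac} = \frac{1}{2} a C_g V^2$$

$$E = E_{ac} + E_{dc} = \frac{1}{2} C_g V^2 \left(a + \frac{2 l_d}{I_{on}/I_{off} - 1}\right)$$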

E is minimum when Ion/Ioff is maximum. Referring to Fig 2, Ion/Ioff is maximum and
constant in the subthreshold region.

In the subthreshold region, if Vds = V = Vhi - Vlo, then

$$\frac{I_{on}}{I_{off}} = e^{(V_{hi} - V_{lo})/(n V_T)} = e^{V/(n V_T)},$$

so E depends only on V = Vhi - Vlo. Therefore, for a given Vdd, energy is constant
in the subthreshold region. For maximum performance at minimum energy, set Vhi = Vt
and Vlo = Vt - Vdd.
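
A short Python sketch illustrates why the choice Vhi = Vt is preferred. It uses only the exponential gate-voltage term of the subthreshold model and ignores the drain-voltage factor. The value of n is an assumption derived from the 80mV/decade slope quoted in Section 9, and I0 and Vdd are hypothetical.

```python
import math

V_THERMAL = 0.026  # thermal voltage V_T in volts (from the text)
# ASSUMPTION: n chosen to give the 80 mV/decade slope quoted in Section 9.
N = 0.080 / (V_THERMAL * math.log(10.0))


def subthreshold_current(vgs, vt, i0, n=N):
    """Gate-voltage dependence only; the (1 - exp(-Vds/V_T)) factor is ignored."""
    return i0 * math.exp((vgs - vt) / (n * V_THERMAL))


def on_off_ratio(v_hi, v_lo, vt, i0, n=N):
    return subthreshold_current(v_hi, vt, i0, n) / subthreshold_current(v_lo, vt, i0, n)


I0 = 1e-7   # HYPOTHETICAL current at vgs == vt, in amperes
VT = 0.2    # threshold voltage in volts
VDD = 0.2   # HYPOTHETICAL logic swing v_hi - v_lo in volts

# Same swing placed at two different offsets below threshold.
ratio_at_vt = on_off_ratio(VT, VT - VDD, VT, I0)
ratio_shifted = on_off_ratio(VT - 0.1, VT - 0.1 - VDD, VT, I0)

# The ON current, and hence the speed, is largest when v_hi sits at Vt.
i_on_at_vt = subthreshold_current(VT, VT, I0)
i_on_shifted = subthreshold_current(VT - 0.1, VT, I0)

print("ratios:", ratio_at_vt, ratio_shifted)
print("ON currents:", i_on_at_vt, i_on_shifted)
```

Both placements give the same Ion/Ioff, and therefore the same energy, but shifting the swing 100mV further below threshold costs more than a decade of ON current.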

DC energy rises exponentially as Vdd decreases. AC energy rises quadratically as Vdd 
increases. For optimum V t , 

$$P_{ac} = a C V^2 f$$

$$P_{dc} = I_{off} V$$

$$I_{on} = l_d C V f$$

If P_ac = P_dc and Vdd = Vt, then

$$\frac{I_{on}}{I_{off}} = \frac{l_d}{a} = e^{V_t/(n V_T)}$$

$$V_t = n V_T \ln(I_{on}/I_{off})$$
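
The resulting rule Vt = n V_T ln(ld/a) is easy to evaluate numerically. In the Python sketch below, ld = 11 is the logic depth reported for the multiplier, while the activity ratio of 0.1 is purely illustrative and n is an assumption derived from the 80mV/decade slope quoted in Section 9.

```python
import math

V_THERMAL = 0.026  # thermal voltage V_T in volts (from the text)
# ASSUMPTION: n chosen to give the 80 mV/decade slope quoted in Section 9.
N = 0.080 / (V_THERMAL * math.log(10.0))


def optimum_vt(logic_depth, activity, n=N):
    """Threshold at which AC and DC power balance: Vt = n * V_T * ln(ld / a)."""
    return n * V_THERMAL * math.log(logic_depth / activity)


LOGIC_DEPTH = 11  # from the 32 x 32 bit multiplier result
ACTIVITY = 0.1    # HYPOTHETICAL activity ratio

vt_opt = optimum_vt(LOGIC_DEPTH, ACTIVITY)
vt_busier = optimum_vt(LOGIC_DEPTH, 0.5)  # higher activity tolerates more leakage

print("optimum Vt:", vt_opt)
print("optimum Vt at a = 0.5:", vt_busier)
```

With these assumed values the optimum threshold comes out at roughly 0.16V, and it falls as the activity ratio rises.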

Figs 7 and 8 show energy vs Vdd. Table 1 lists the voltages and energies at the global
minima.

7 Minimum Energy x Time


[Plot: 1/(energy x time) vs supply and threshold voltage]



The minimum energy solution is quite slow. Performance should improve dramatically in 
deep submicron and with low voltage process optimizations. An alternative approach is to 
minimize energy x time. If we assume transistors operate mostly in saturation, then 

$$E t \propto V^2 \frac{Q}{I} \propto \frac{V^3}{(V - V_t)^2}$$

$$(E t)_{min} \propto \frac{27}{4} V_t \quad \text{at} \quad V = 3 V_t$$

Fig 9 shows a maximum at 3Vt which grows much more pronounced at low voltage.
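
The location of this optimum can be confirmed numerically. The Python sketch below scans V^3/(V - Vt)^2 over supply voltages above threshold; constant factors are dropped, and the scan range and step count are arbitrary choices.

```python
def energy_time(v, vt):
    """Energy x time up to constant factors, assuming saturation: V^3 / (V - Vt)^2."""
    return v ** 3 / (v - vt) ** 2


def argmin_energy_time(vt, steps=20000, v_max_factor=10.0):
    """Brute-force scan of V over (Vt, v_max_factor * Vt]; returns (V, value)."""
    best_v, best_val = None, float("inf")
    for i in range(1, steps + 1):
        v = vt + (v_max_factor - 1.0) * vt * i / steps
        val = energy_time(v, vt)
        if val < best_val:
            best_v, best_val = v, val
    return best_v, best_val


VT = 0.2  # threshold voltage in volts, as in Fig 3
v_best, et_best = argmin_energy_time(VT)

print("best V / Vt:", v_best / VT)
print("minimum (energy x time) / Vt:", et_best / VT)
```

The scan settles at a supply of three times the threshold, where the expression equals 27/4 of Vt.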


8 Circuit Design Constraints 

A number of interesting circuit design constraints appear when leakage currents are large, 
and when the dependence of current on voltage is exponential. Three constraints we have 
observed to date: 

• Dynamic circuits are difficult to manage. A minimum size transistor will have a
leakage current of about 1nA at Vt = 160mV. A dynamic storage node with 100fF
of capacitance will hold 50fC of charge at Vdd = 0.5V. A change of 100mV requires
movement of 10fC. 10fC/1nA = 10usec.

• Exponential dependence of current on voltage makes pass transistor logic difficult to
use. nfets cannot pass ones and pfets cannot pass zeros. In particular, using nfets as
access transistors for static latches does not work.


parameter     negative                                  positive
----------    --------------------------------------    --------------------------------
reduce Xj     increase Rd                               decrease cjsw, cgso, cgdo
reduce Tox    decrease Vgs max (gate-src breakdown);    increase k; decrease n
              increase Cox (increase energy)
reduce NB     decrease Vds max (punchthrough)           increase μ0; decrease cj, cjsw, n
reduce NG     increase RG                               decrease Vt
reduce ND     increase RS, RD                           decrease cj, cjsw

Table 2: Process optimization opportunities.

• Fully static logic appears to work well. Transmission gate latches work nicely. SRAM 
seems to work well, since one of the bitlines will be pulling down on a write. 
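
The hold-time estimate for the dynamic storage node in the first constraint can be reproduced with a few lines of Python. All values come from the text; the function name droop_time is illustrative, and the leakage current is treated as constant over the droop.

```python
def droop_time(capacitance_f, delta_v, leakage_a):
    """Time for a constant leakage current to move a node by delta_v: t = C * dV / I."""
    return capacitance_f * delta_v / leakage_a


C_NODE = 100e-15  # dynamic storage node capacitance, 100 fF
VDD = 0.5         # supply voltage in volts
LEAKAGE = 1e-9    # minimum size transistor leakage at Vt = 160 mV, 1 nA
DROOP = 0.1       # tolerated change on the node, 100 mV

stored_charge = C_NODE * VDD  # charge held on the node
hold_time = droop_time(C_NODE, DROOP, LEAKAGE)

print("stored charge (fC):", stored_charge * 1e15)
print("hold time (us):", hold_time * 1e6)
```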


9 Process Optimization 

The opportunity exists to improve performance by optimizing fabrication processes for 
low voltage operation. Carrier mobility degrades significantly in submicron processes as 
channel doping is increased to prevent punchthrough in the presence of strong electric 
fields. Reduced voltage operation results in weaker fields, permitting lower channel doping 
which results in higher carrier mobility and increased transconductance. 

Reduced voltage operation also permits lower diffusion doping, since higher diffusion 
resistance will not impact circuit performance due to reduced transistor drain current. 
This reduces diffusion capacitance to a negligible fraction of gate capacitance. The only 
drawback of reducing diffusion doping is that lateral diffusion is reduced, increasing the 
effective channel length. This is partially offset by the reduced Miller effect since the gate- 
drain overlap capacitance is reduced. Table 2 summarizes the impact of various process 
modifications on energy and performance. 

While a lower bound of 60mV/decade is achievable at room temperature (dV =
n V_T ln(10) with n = 1), dV is more typically 80mV/decade in 2µ CMOS and 90mV/decade
in 0.8µ CMOS. Tox/d0 can be reduced by reducing NB, since

$$d_0 = \sqrt{2 \epsilon_{si} \phi_s / (q N_B)},$$

where $\phi_s = V_T \ln(N_B / n_i)$ and $n_i = \ldots \times 10^{16}$ [5].

Low gate, drain, and threshold voltages permit all doping concentrations to be reduced, 
once again due to lower electric field strength. This has two benefits for low voltage 
operation: 

1. n is reduced, decreasing the subthreshold slope and thus reducing the supply voltage 
(and therefore energy per operation) necessary to achieve the desired on/off current 
ratio. 

2. source/drain capacitances are reduced, further reducing energy per operation. 
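
The link between n, the subthreshold slope, and the required supply voltage can be sketched in Python as follows. It uses dV = n V_T ln(10) from the text and the subthreshold relation Ion/Ioff = exp(V/(n V_T)). The target on/off ratio of 1000 is a hypothetical example.

```python
import math

V_THERMAL = 0.026  # thermal voltage V_T in volts (from the text)


def slope_per_decade(n):
    """Subthreshold slope in volts per decade: dV = n * V_T * ln(10)."""
    return n * V_THERMAL * math.log(10.0)


def n_from_slope(slope_v):
    """Invert the slope relation to recover n."""
    return slope_v / (V_THERMAL * math.log(10.0))


def supply_for_ratio(ratio, n):
    """Subthreshold swing needed for a given Ion/Ioff: V = n * V_T * ln(ratio)."""
    return n * V_THERMAL * math.log(ratio)


ideal_slope = slope_per_decade(1.0)  # lower bound, about 60 mV/decade
n_2um = n_from_slope(0.080)          # 80 mV/decade, quoted for 2 micron CMOS
n_08um = n_from_slope(0.090)         # 90 mV/decade, quoted for 0.8 micron CMOS

TARGET_RATIO = 1000.0  # HYPOTHETICAL desired on/off current ratio
v_ideal = supply_for_ratio(TARGET_RATIO, 1.0)
v_2um = supply_for_ratio(TARGET_RATIO, n_2um)
v_08um = supply_for_ratio(TARGET_RATIO, n_08um)

print("n:", n_2um, n_08um)
print("supply for ratio 1000 (V):", v_ideal, v_2um, v_08um)
```

Each decade of on/off ratio costs one slope's worth of supply voltage, so reducing n toward 1 lowers the supply needed for the same ratio.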

10 Barriers to Practical Implementation 

A number of practical considerations place a lower bound on supply voltage. These are:
external interfacing, controlling device thresholds, maintaining adequate noise margins, 
power supply design, power consumption of OFF devices, and circuit speed. Multichip 
module packaging provides the opportunity to isolate low-voltage subsystems from other 
system components. Limits to low voltage operation may be determined to a large extent 
by the power dissipation in level-shifting interface circuits. Device thresholds have been 
observed to vary with transistor geometry and even location on a chip [3]. 

A 10 watt power supply will have to deliver 20 amps at Vdd = 500mV.


11 CIS Test chip 

In the BiCMOS process at Stanford’s Center for Integrated Systems, pfet gates are doped 
p+ and nfet gates are doped n+. This means that if the channel implant is excluded, 
both devices have thresholds close to zero volts. V t can then be adjusted by adjusting 
the substrate bias voltage. We have implemented a test chip which contains a number of 
simple circuit structures (see Fig 10), and will hopefully have some results in time for the 
conference. The chip has the following characteristics: 

• Pfet gates doped p+ have Vt ≈ 0V

• Independent substrate and well biases 

• self-testing convolutional coder 

• ring oscillator 

• VCO 

• single nfet, pfet, nand, latch

Figure 10: Ultra Low Power test chip. Separate bias voltages together with zero-Vt pfets
permit threshold adjustment.

12 Conclusions 

Submicron CMOS, together with 3D stacked multichip modules and massively parallel
machines, demands new approaches to power dissipation. We are in the very early stages of
investigating reducing energy by reducing supply and threshold voltages. We are hopeful
that low voltage CMOS can find widespread use in performance driven, power constrained
systems.